Summary
Objectives: Component-wise boosting algorithms have evolved into a popular estimation scheme
in biomedical regression settings. The number of boosting iterations is the most important
tuning parameter for optimizing their performance. To date, no fully automated
strategy for determining the optimal stopping iteration of boosting algorithms has
been proposed.
Methods: We propose a fully data-driven sequential stopping rule for boosting algorithms.
It combines resampling methods with a modified version of an earlier stopping approach
that depends on AIC-based information criteria. The new “subsampling after AIC” stopping
rule is applied to component-wise gradient boosting algorithms.
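To make the idea concrete, the following is a minimal illustrative sketch (not the authors' implementation) of the general scheme: run componentwise L2 gradient boosting on random subsamples, pick a stopping iteration on each subsample with a sequential AIC-type criterion, and aggregate the subsample-wise stops. All function names, the crude degrees-of-freedom proxy, and the choice of the median as aggregator are assumptions for illustration only.

```python
import numpy as np

def componentwise_l2_boost(X, y, mstop, nu=0.1):
    """Componentwise L2 gradient boosting with simple linear base-learners:
    at each step, fit the single covariate that best explains the residuals."""
    n, p = X.shape
    f = np.full(n, y.mean())
    path = [f.copy()]
    for _ in range(mstop):
        r = y - f
        best_j, best_rss, best_coef = 0, np.inf, 0.0
        for j in range(p):
            xj = X[:, j]
            coef = xj @ r / (xj @ xj)
            rss = np.sum((r - coef * xj) ** 2)
            if rss < best_rss:
                best_j, best_rss, best_coef = j, rss, coef
        f = f + nu * best_coef * X[:, best_j]  # damped update of the fit
        path.append(f.copy())
    return path

def aic_stop(path, y, patience=3):
    """Sequential AIC-type stopping: stop once the criterion has not
    improved for `patience` consecutive iterations (hypothetical rule;
    degrees of freedom are crudely proxied by the iteration count)."""
    n = len(y)
    best_m, best_aic, worse = 0, np.inf, 0
    for m, f in enumerate(path):
        rss = np.sum((y - f) ** 2)
        aic = n * np.log(rss / n) + 2 * (m + 1)
        if aic < best_aic:
            best_m, best_aic, worse = m, aic, 0
        else:
            worse += 1
            if worse >= patience:
                break
    return best_m

def subsample_after_aic(X, y, mstop=100, B=10, frac=0.5, seed=0):
    """Aggregate the AIC-chosen stopping iteration over B random subsamples."""
    rng = np.random.default_rng(seed)
    n = len(y)
    stops = []
    for _ in range(B):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        path = componentwise_l2_boost(X[idx], y[idx], mstop)
        stops.append(aic_stop(path, y[idx]))
    return int(np.median(stops))
```

The point of the subsampling step is that a purely AIC-based stop on the full data tends to be unstable; combining several subsample-wise sequential stops gives a more robust, still fully data-driven choice.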
Results: The newly developed sequential stopping rule outperformed earlier approaches when applied
to both simulated and real data. Specifically, it improved on purely AIC-based methods
in the microarray-based prediction of the recurrence of metastases in
stage II colon cancer patients.
Conclusions: The proposed sequential stopping rule can identify the optimal stopping iteration
of boosting algorithms during the fitting process itself,
at least for the most common loss functions.
Keywords
Gradient boosting - resampling methods - early stopping - variable selection - penalized
regression